1. INTRODUCTION

In discriminant analysis, a two-step procedure is often followed: first,
training samples are obtained to set up a discriminant rule, and then
individuals are classified using the sample-based rule. However, if the
criterion for assigning the training samples to their true classes is
imperfect, some training samples may be misallocated. For example, this arises
in the discrimination of crops in an area using spectral data acquired from a
satellite. The scene image of the area is analyzed to delineate crop features,
and training samples are assigned crop labels based on visual interpretation of
their spectral observations. This can lead to mislabeling of crops for some
training samples and thus may adversely affect the performance of a
discriminant rule.

We study here linear discriminant analysis in the presence of misallocation in
a training set. Suppose that individuals come from one of two classes, $C_1$
and $C_2$. A $p$-dimensional random vector $X$ is measured on each individual.
It is assumed that $X$ has the multivariate normal distribution with mean
$\mu_i$ and covariance matrix $\Sigma$ for $C_i$, $i = 1, 2$. In a training
sample of $n$ individuals, suppose $n_1$ are allocated into $C_1$ and
$n_2 = n - n_1$ into $C_2$. If $\alpha_i$ is the fraction of training samples
from $C_i$ that are misallocated, $i = 1, 2$, the two samples of sizes $n_1$
and $n_2$ represent mixed classes, say $C_1^*$ and $C_2^*$, instead of the
original classes $C_1$ and $C_2$. Let $\bar{X}_1^*$, $\bar{X}_2^*$ and $S^*$
denote the sample means and the pooled sample covariance matrix, respectively.
Then a random observation $X$ can be classified on the basis of the linear
discriminant function (Anderson, 1958) given by

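$$ \hat{\lambda}(X) = \hat{\beta}_0 + \hat{\beta}' X, \tag{1.1} $$

with the coefficients $\hat{\beta}_0$ and $\hat{\beta}$ computed from
$\bar{X}_1^*$, $\bar{X}_2^*$ and $S^*$.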
The classification procedure is to regard the observed value $X$ as coming
from $C_1$ or $C_2$ according as the discriminant value $\hat{\lambda}(X) < 0$
or $> 0$, respectively. Then the error rates for the procedure are given by

$$ R_1 = \Pr\{\hat{\lambda}(X) > 0 \mid X \in C_1, \bar{X}_1^*, \bar{X}_2^*, S^*\}, $$

$$ R_2 = \Pr\{\hat{\lambda}(X) < 0 \mid X \in C_2, \bar{X}_1^*, \bar{X}_2^*, S^*\}, \tag{1.3} $$

and the average error rate is given by

$$ R = \pi_1 R_1 + \pi_2 R_2, \tag{1.4} $$

where $\pi_1$ and $\pi_2$ are the prior probabilities associated with $C_1$
and $C_2$.
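Conditional on the training statistics, the discriminant value is a fixed
linear function of a normal vector, so each error rate in (1.3) is a normal
tail probability. The following is a minimal sketch of that calculation for a
generic rule "assign to $C_2$ if $b_0 + b'x > 0$". The function names and the
list-based representation of vectors and matrices are illustrative assumptions,
not part of the paper.

```python
from math import sqrt
from statistics import NormalDist


def linear_rule_error_rates(b0, b, mu1, mu2, sigma):
    """Error rates (R1, R2) of the rule 'assign to C2 if b0 + b'x > 0'
    when X ~ N(mu_i, sigma) in class C_i (sigma is a list of rows)."""
    p = len(b)
    # Standard deviation of the discriminant value: sqrt(b' Sigma b).
    sd = sqrt(sum(b[j] * sigma[j][k] * b[k] for j in range(p) for k in range(p)))
    mean1 = b0 + sum(bj * mj for bj, mj in zip(b, mu1))  # mean under C1
    mean2 = b0 + sum(bj * mj for bj, mj in zip(b, mu2))  # mean under C2
    cdf = NormalDist().cdf
    r1 = 1.0 - cdf(-mean1 / sd)  # P(discriminant value > 0 | C1)
    r2 = cdf(-mean2 / sd)        # P(discriminant value < 0 | C2)
    return r1, r2


def average_error_rate(pi1, r1, r2):
    """Average error rate R = pi1*R1 + pi2*R2, as in (1.4)."""
    return pi1 * r1 + (1.0 - pi1) * r2
```

For two classes with means $(-1, 0)$ and $(1, 0)$, identity covariance and the
rule $2x_1 > 0$, both error rates equal the normal tail area beyond one
standard deviation, roughly 0.159.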

Assuming that training samples are randomly misallocated, Lachenbruch (1966)
and McLachlan (1972) studied $R_1$ and $R_2$ for their expected values and
variances. However, a random misallocation model is unrealistic, particularly
if the observation $X$ is itself used in determining the allocation.
Lachenbruch (1974) suggested a non-random allocation model with two variations.
His criterion for allocation was based on the distances of an observation from
the class means. Here we propose an allocation model in which the
misallocation of a sample depends upon its observation. The random and
non-random misallocation models of Lachenbruch become special cases of this new
model (Section 2).



For the discriminant function in (1.1), we give the asymptotic distribution of
the discriminant boundary and obtain the asymptotic mean and variance of each
of the error rates $R_1$, $R_2$ and $R$ (Section 3). We take the same approach
that was used by Efron (1975) and extend his normal discrimination results to
the case of misallocated training samples. The present study can also be viewed
as an extension of Sayre (1980), who gives the asymptotic distribution of $R$
assuming correct allocation for the training samples, although we do not
explicitly give the distribution here. McLachlan (1972) has given the
asymptotic means and variances of the error rates for random misallocation, but
his derivation is limited to only one of the two misallocation rates being
non-zero. Lachenbruch (1966, 1974) investigated the means and variances of
$R_1$ and $R_2$ for his models using simulations. Michalek and Tripathi (1980)
discussed the problem for random misallocation, but they studied the
discrimination between the mixed classes and not between the original classes.
Sections 4 and 5 give numerical results showing the adverse effect of
misallocation on the linear discriminant boundary and the associated error
rates.

2. MISALLOCATION MODELS


Suppose $\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2)$. By means of
linear transformations, one can reduce the class structures to the canonical
form (Efron 1975), where

$$ X \sim N_p(-(\Delta/2)\,e_1,\ I) \text{ for } C_1, \qquad X \sim N_p((\Delta/2)\,e_1,\ I) \text{ for } C_2, \tag{2.1} $$

with $e_1 = (1, 0, \ldots, 0)'$, so that the class means $\mu_1$ and $\mu_2$
are aligned along the $x_1$-axis. Suppose allocation of an individual is made
using its observation $X$. It is desirable to consider an allocation in which
the chance of misallocation for an individual increases as its observation
deviates further away from the mean of its true class in the direction of the
mean of the other class. So let the probability of misallocation of an
individual from $C_i$ into $C_{3-i}$ be $g_i(z)$, $i = 1, 2$, where $g_1(z)$ is
a monotone increasing function and $g_2(z)$ is a monotone decreasing function,
with $z$ measured along the $x_1$-axis. Suppose $f_i(z)$ is the frequency
function of the first component of the random vector $X$ for $C_i$ and

$$ \pi_i = \int_{-\infty}^{\infty} f_i(z)\,dz, \quad i = 1, 2. $$

Define the misallocation rate $\alpha_i$ by

$$ \alpha_i = (1/\pi_i) \int_{-\infty}^{\infty} g_i(z)\,f_i(z)\,dz, \quad i = 1, 2. \tag{2.2} $$

Given $\alpha_1$ and $\alpha_2$, the functions $g_1$ and $g_2$ can be specified
differently.

The random misallocation model (Lachenbruch 1966, McLachlan 1972, Michalek and
Tripathi 1980) corresponds to the uniform case given below, to be called
model (a):

(a) Random Misallocation
For $X \in C_i$, let

$$ g_i(z) = \alpha_i, \quad i = 1, 2. \tag{2.3} $$

Another model, to be called model (b), is obtained by specifying $g_1$ and
$g_2$ as follows:

(b) "Truncated" Model:

For $X \in C_1$, let

$$ g_1(z) = \begin{cases} 0, & z < a_1 \\ u, & z > a_1 \end{cases} $$

and for $X \in C_2$, let

$$ g_2(z) = \begin{cases} u, & z < a_2 \\ 0, & z > a_2 \end{cases} \tag{2.4} $$

where $a_i$ is determined from (2.2). After solving it, we obtain

$$ a_1 = -(\Delta/2) + z_{1-\alpha_1/u}, \qquad a_2 = \Delta/2 + z_{\alpha_2/u}, $$

where $z_\gamma$ denotes the $\gamma$-percentage point of the standard normal
distribution.
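The thresholds of the truncated model can be evaluated directly from normal
quantiles. The sketch below assumes the canonical form, in which the first
coordinate is $N(-\Delta/2, 1)$ in $C_1$ and $N(\Delta/2, 1)$ in $C_2$, and
takes $z_\gamma$ to be the lower $\gamma$ quantile. The function names are
hypothetical. The quantile call requires $0 < \alpha_i < u$, so the boundary
case $\alpha_i = 0$ (no threshold needed) is not handled.

```python
from statistics import NormalDist


def truncated_thresholds(alpha1, alpha2, delta, u):
    """Thresholds (a1, a2) of the truncated model for 0 < alpha_i < u."""
    z = NormalDist().inv_cdf
    a1 = -delta / 2.0 + z(1.0 - alpha1 / u)  # so that u * P(z > a1 | C1) = alpha1
    a2 = delta / 2.0 + z(alpha2 / u)         # so that u * P(z < a2 | C2) = alpha2
    return a1, a2


def implied_rates(a1, a2, delta, u):
    """Misallocation rates implied by the thresholds (a check on the inversion)."""
    cdf = NormalDist().cdf
    rate1 = u * (1.0 - cdf(a1 + delta / 2.0))
    rate2 = u * cdf(a2 - delta / 2.0)
    return rate1, rate2
```

For example, with $\Delta = 2$, $u = .5$, $\alpha_1 = .1$ and $\alpha_2 = .2$,
the thresholds come out near $a_1 = -0.158$ and $a_2 = 0.747$, and
`implied_rates` returns the two rates that were put in.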


If we assume $u = 1$ and $a_1 = a_2 = 0$, then one obtains the complete
separation model of Lachenbruch (1974). His other non-random model can be
obtained by taking the $a_i$ as percentage points of the chi-square
distribution with $p$ degrees of freedom.

Though models (a) and (b) are appealing because they are easy to implement,
they may not always be suitable. It may be more appropriate to let the
probability of misallocation increase as the observed value deviates away from
the mean of its true class. One such model can be defined as follows:

(c) Exponential Model:

For $X \in C_1$, let

$$ g_1(z) = \begin{cases} 0, & z < -\Delta/2 \\ 1 - \exp(-k_1 [z + \Delta/2]^2 / 2), & z > -\Delta/2 \end{cases} $$

and for $X \in C_2$, let

$$ g_2(z) = \begin{cases} 1 - \exp(-k_2 [z - \Delta/2]^2 / 2), & z < \Delta/2 \\ 0, & z > \Delta/2 \end{cases} \tag{2.5} $$

where $k_i$ is determined from (2.2). It easily follows that

$$ k_i = (1 - 2\alpha_i)^{-2} - 1, \quad i = 1, 2. $$
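The closed form for $k_i$ can be checked numerically. Under the canonical form
the standardized deviation $y = z + \Delta/2$ is standard normal in $C_1$, so
(2.2) reduces to the integral of $(1 - e^{-k y^2/2})\,\phi(y)$ over $y > 0$,
which equals $1/2 - (1/2)(1 + k)^{-1/2}$. The sketch below, with hypothetical
function names, inverts this relation and verifies it with a plain trapezoidal
rule. It assumes $0 \le \alpha_i < 1/2$.

```python
from math import exp, pi, sqrt


def exponential_rate_constant(alpha):
    """k_i = (1 - 2*alpha_i)**-2 - 1, valid for 0 <= alpha_i < 0.5."""
    return (1.0 - 2.0 * alpha) ** -2 - 1.0


def implied_rate(k, upper=12.0, steps=24000):
    """Misallocation rate implied by k: integral over y > 0 of
    (1 - exp(-k*y*y/2)) * phi(y), evaluated by the trapezoidal rule."""
    h = upper / steps

    def integrand(y):
        return (1.0 - exp(-k * y * y / 2.0)) * exp(-y * y / 2.0) / sqrt(2.0 * pi)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h
```

For instance, $\alpha_i = .1$ gives $k_i = 0.5625$ and $\alpha_i = .4$ gives
$k_i = 24$. In both cases `implied_rate` recovers the rate that was put in.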

In practice, the misallocation rates $\alpha_i$ will be subject to sampling
variation. Hence, these rates are treated as random variables.

In Appendix A, we derive the mean vectors and the covariance matrices of the
mixture distributions of $C_1^*$ and $C_2^*$, and in Section 3, we give the
discriminant analysis for arbitrary functions $g_1$ and $g_2$ as defined
earlier.

For the numerical computations presented in Sections 4 and 5, we consider the
special cases, models (a), (b) and (c), and compare the performances of the
discriminant rule associated with the discriminant function in (1.1) for
these models.


3. DISCRIMINANT BOUNDARY AND ERROR RATES 

When the parameters are known, the discriminant rule is: classify $X$ into
$C_1$ if $\lambda(X) < 0$ and into $C_2$ otherwise, where

$$ \lambda(X) = \beta_0 + \beta' X, \tag{3.1} $$

$$ \beta_0 = -\log(\pi_1^*/\pi_2^*) - (\mu_{21}^{*2} - \mu_{11}^{*2}) / [2(1 + \zeta)], $$

$$ \beta_1 = (\mu_{21}^* - \mu_{11}^*)/(1 + \zeta), \qquad \beta_j = 0, \quad j = 2, \ldots, p. $$

As discussed by Efron (1975), the "optimum" boundary, $\lambda(X) = 0$, is the
$(p-1)$-dimensional plane orthogonal to the $x_1$-axis and intersecting it at

$$ \tau = -\beta_0/\beta_1. \tag{3.2} $$

For large sample size $n$, the sample-based boundary, $\hat{\lambda}(X) = 0$,
is the plane intersecting the $x_1$-axis at $\hat{\tau} = \tau + d\tau$ with
normal vector at an angle $d\varepsilon$ from the $x_1$-axis, where $d\tau$ and
$d\varepsilon$ represent small deviations from 0. With no loss of generality,
suppose $\tau > 0$. Then the distances of $\mu_1$ and $\mu_2$ from the optimum
boundary are

$$ D_1 = \Delta/2 + \tau, \qquad D_2 = \Delta/2 - \tau, \tag{3.3} $$

and those from the sample-based boundary are

$$ d_1 = (D_1 + d\tau)\cos(d\varepsilon), \qquad d_2 = (D_2 - d\tau)\cos(d\varepsilon). \tag{3.4} $$

Refer to Efron (1975) for a pictorial description of the two discriminant
boundaries and other related details.
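The cut point and the two distances can be evaluated once the mixed-class
parameters are known. The sketch below rests on one reading of the canonical
form, namely $\beta_1 = (\mu_{21}^* - \mu_{11}^*)/(1 + \zeta)$ and
$\beta_0 = -\log(\pi_1^*/\pi_2^*) - (\mu_{21}^{*2} - \mu_{11}^{*2})/[2(1 + \zeta)]$,
where $1 + \zeta$ is the pooled variance of the first coordinate in the mixed
classes (Appendix A). That reading and the function names are assumptions.

```python
from math import log
from statistics import NormalDist


def limiting_boundary(pi1_star, mu11_star, mu21_star, zeta, delta):
    """Cut point tau = -beta0/beta1 and distances (D1, D2) of the original
    class means -delta/2 and +delta/2 from the limiting boundary."""
    pi2_star = 1.0 - pi1_star
    pooled_var = 1.0 + zeta
    beta1 = (mu21_star - mu11_star) / pooled_var
    beta0 = (-log(pi1_star / pi2_star)
             - (mu21_star ** 2 - mu11_star ** 2) / (2.0 * pooled_var))
    tau = -beta0 / beta1
    return tau, delta / 2.0 + tau, delta / 2.0 - tau


def limiting_error_rates(d1, d2):
    """Error rates Phi(-D1), Phi(-D2) attached to the limiting boundary."""
    cdf = NormalDist().cdf
    return cdf(-d1), cdf(-d2)
```

As a check under random misallocation with $\pi_1 = .5$, $\alpha_1 = .1$,
$\alpha_2 = 0$ and $\Delta = 2$, the mixed classes have $\pi_1^* = .45$,
$\mu_{11}^* = -1$, $\mu_{21}^* = 9/11$ and $\zeta = 2/11$. The function then
returns $\tau$ close to $-0.221$ and error rates close to $.218$ and $.111$,
which agree with the model (a) entries for this case in Tables 1 and 2.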

The error rates can now be written in terms of these distances:

$$ R_1^0 = \Phi(-D_1), \qquad R_2^0 = \Phi(-D_2) \tag{3.5} $$

for the "optimum" boundary, and

$$ R_1 = \Phi(-d_1), \qquad R_2 = \Phi(-d_2) \tag{3.6} $$

for the sample-based boundary, where $\Phi$ stands for the standard normal cdf.
The quantities $\pi_1^*$, $\pi_2^*$, $\mu_{11}^*$, $\mu_{21}^*$ and $\zeta$
that enter the coefficients in (3.1) are defined in Appendix A. Let $\phi$
denote the density function of the standard normal. Then, ignoring higher than
second order differential terms, we have (Efron, 1975)

$$ R_1 = R_1^0 - \phi(D_1)\,d\tau + (D_1/2)\,\phi(D_1)\,[(d\tau)^2 + (d\varepsilon)^2], $$

$$ R_2 = R_2^0 + \phi(D_2)\,d\tau + (D_2/2)\,\phi(D_2)\,[(d\tau)^2 + (d\varepsilon)^2], \tag{3.7} $$

where

$$ d\tau = -(d\beta_0 + \tau\,d\beta_1)/\beta_1, $$

$$ (d\varepsilon)^2 = [(d\beta_2)^2 + \cdots + (d\beta_p)^2]/\beta_1^2, \tag{3.8} $$

with $d\beta_j = (\hat{\beta}_j - \beta_j)$ denoting the error in the estimate
$\hat{\beta}_j$, $j = 0, 1, 2, \ldots, p$, given in (1.2). We denote
$d\beta_{(2)} = (d\beta_2, d\beta_3, \ldots, d\beta_p)'$.
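The second-order expansion of the error rates can be compared with the exact
normal tail areas for small perturbations of the boundary. The sketch below
assumes the expansion takes the form
$R_1 \approx \Phi(-D_1) - \phi(D_1)\,d\tau + (D_1/2)\,\phi(D_1)[(d\tau)^2 + (d\varepsilon)^2]$,
with the sign of the linear term reversed for $R_2$, which is what a Taylor
expansion of $\Phi(-d_i)$ about $D_i$ gives. The function names are
hypothetical.

```python
from math import cos
from statistics import NormalDist

_N = NormalDist()


def exact_rates(d1_opt, d2_opt, dtau, deps):
    """Exact error rates for a boundary shifted by dtau and tilted by deps."""
    r1 = _N.cdf(-(d1_opt + dtau) * cos(deps))
    r2 = _N.cdf(-(d2_opt - dtau) * cos(deps))
    return r1, r2


def second_order_rates(d1_opt, d2_opt, dtau, deps):
    """Second-order approximation to the same two error rates."""
    quad = dtau ** 2 + deps ** 2
    r1 = (_N.cdf(-d1_opt) - _N.pdf(d1_opt) * dtau
          + 0.5 * d1_opt * _N.pdf(d1_opt) * quad)
    r2 = (_N.cdf(-d2_opt) + _N.pdf(d2_opt) * dtau
          + 0.5 * d2_opt * _N.pdf(d2_opt) * quad)
    return r1, r2
```

With $D_1 = D_2 = 1$ and perturbations of size 0.05 in both the shift and the
angle, the exact and approximate rates agree to well within $10^{-4}$.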


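Since $n$ is large, one may assume that
$\sqrt{n}\,(d\beta_0, d\beta_1, d\beta_{(2)}')'$ has a limiting normal
distribution with mean 0 and covariance matrix $V_\beta$. In Appendix B, we
obtain $V_\beta$ and write it in the form

$$ V_\beta = \begin{pmatrix} \sigma_{00} & \sigma_{01} & 0' \\ \sigma_{01} & \sigma_{11} & 0' \\ 0 & 0 & \sigma_{22} I \end{pmatrix}, \tag{3.9} $$

with the quantities $\sigma_{00}$, $\sigma_{01}$, $\sigma_{11}$ and
$\sigma_{22}$ expressed in terms of the basic input parameters, such as the
prior probabilities and the misallocation rates, among others.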



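It follows from (3.8) that

$$ \sigma_\tau^2 = n\,E[(d\tau)^2] = (\sigma_{00} + 2\tau\sigma_{01} + \tau^2\sigma_{11})/\beta_1^2. \tag{3.10} $$

Suppose we define

$$ d\omega_j = d\beta_j/\beta_1, \quad j = 2, 3, \ldots, p. $$

Then the asymptotic variance of $\sqrt{n}\,d\omega_j$ is

$$ \sigma_\omega^2 = \sigma_{22}/\beta_1^2, \quad j = 2, 3, \ldots, p. \tag{3.11} $$

Next, $\sqrt{n}\,(d\tau, d\omega')'$, with
$d\omega = (d\omega_2, \ldots, d\omega_p)'$, has a limiting normal distribution
with mean 0 and covariance matrix $T V_\beta T'$, where

$$ T = (1/\beta_1) \begin{pmatrix} -1 & -\tau & 0' \\ 0 & 0 & I \end{pmatrix}. $$

The covariance matrix may be written as

$$ T V_\beta T' = \begin{pmatrix} \sigma_\tau^2 & 0' \\ 0 & \sigma_\omega^2 I \end{pmatrix}, $$

where $\sigma_\omega^2 = \sigma_{22}/\beta_1^2$.

Since $(d\varepsilon)^2 = \sum_{j=2}^{p} (d\omega_j)^2$ and
$n(d\omega_j)^2/\sigma_\omega^2 \sim \chi_1^2$, $j = 2, 3, \ldots, p$, we have
$n(d\varepsilon)^2/\sigma_\omega^2 \sim \chi_{p-1}^2$. Furthermore,
$n(d\tau)^2/\sigma_\tau^2 \sim \chi_1^2$.
(The symbol $\sim$ should read "asymptotically distributed as".)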
From (3.7) and the above distributional results, the asymptotic moments of
the error rates can now be easily obtained. Since $(d\tau)^2$ and
$(d\varepsilon)^2$ are asymptotically uncorrelated and

$$ E[(d\tau)^2] = \sigma_\tau^2/n, \qquad E[(d\varepsilon)^2] = (p-1)\,\sigma_\omega^2/n, $$

and

$$ V[(d\tau)^2] = 2\sigma_\tau^4/n^2, \qquad V[(d\varepsilon)^2] = 2(p-1)\,\sigma_\omega^4/n^2, $$

the asymptotic means of $R_1$ and $R_2$, ignoring second and higher order
terms, are given by

$$ E[R_1] = R_1^0 + (D_1/2n)\,\phi(D_1)\,[\sigma_\tau^2 + (p-1)\sigma_\omega^2], $$

$$ E[R_2] = R_2^0 + (D_2/2n)\,\phi(D_2)\,[\sigma_\tau^2 + (p-1)\sigma_\omega^2]. \tag{3.12} $$

For the asymptotic second order moments, ignoring third and higher order
terms, we have the variances and covariance of $R_1$ and $R_2$ as follows:

$$ V[R_1] = (1/n)\,\phi^2(D_1)\,\{\sigma_\tau^2 + (D_1^2/2n)[\sigma_\tau^4 + (p-1)\sigma_\omega^4]\}, $$

$$ V[R_2] = (1/n)\,\phi^2(D_2)\,\{\sigma_\tau^2 + (D_2^2/2n)[\sigma_\tau^4 + (p-1)\sigma_\omega^4]\}, $$

$$ \mathrm{Cov}[R_1, R_2] = (1/n)\,\phi(D_1)\phi(D_2)\,\{-\sigma_\tau^2 + (D_1 D_2/2n)[\sigma_\tau^4 + (p-1)\sigma_\omega^4]\}, \tag{3.13} $$

where $\sigma_\tau^2$ and $\sigma_\omega^2$ are functions of the elements of
$\beta_0$, $\beta_1$ and $V_\beta$.

Clearly, $E[R_i]$ approaches $R_i^0$, $i = 1, 2$, as $n$ becomes infinite.

For the average error rate, we have

$$ E[R] = R^0 + (1/2n)\,[\pi_1 D_1 \phi(D_1) + \pi_2 D_2 \phi(D_2)]\,[\sigma_\tau^2 + (p-1)\sigma_\omega^2], $$

$$ V[R] = \pi_1^2\,V[R_1] + \pi_2^2\,V[R_2] + 2\pi_1\pi_2\,\mathrm{Cov}(R_1, R_2), \tag{3.14} $$

where $R^0 = \pi_1 R_1^0 + \pi_2 R_2^0$, and $V[R_1]$, $V[R_2]$ and
$\mathrm{Cov}[R_1, R_2]$ are as given in (3.13).
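The asymptotic means and variances are simple to evaluate once $D_1$, $D_2$,
$\sigma_\tau^2$ and $\sigma_\omega^2$ are available. The sketch below assumes
the moment formulas read $E[R_i] = \Phi(-D_i) + (D_i/2n)\,\phi(D_i)\,S_1$ and
$V[R_i] = (1/n)\,\phi^2(D_i)\,\{\sigma_\tau^2 + (D_i^2/2n)\,S_2\}$, with
$S_1 = \sigma_\tau^2 + (p-1)\sigma_\omega^2$ and
$S_2 = \sigma_\tau^4 + (p-1)\sigma_\omega^4$, and the covariance built the
same way with $-\sigma_\tau^2$ as leading term. The function name and the
dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def asymptotic_moments(pi1, d1, d2, var_tau, var_omega, n, p):
    """Asymptotic means and standard deviations of R1, R2 and R."""
    nd = NormalDist()
    pi2 = 1.0 - pi1
    s1 = var_tau + (p - 1) * var_omega            # first-order spread term
    s2 = var_tau ** 2 + (p - 1) * var_omega ** 2  # second-order spread term
    f1, f2 = nd.pdf(d1), nd.pdf(d2)
    e_r1 = nd.cdf(-d1) + d1 / (2.0 * n) * f1 * s1
    e_r2 = nd.cdf(-d2) + d2 / (2.0 * n) * f2 * s1
    v_r1 = f1 ** 2 / n * (var_tau + d1 ** 2 / (2.0 * n) * s2)
    v_r2 = f2 ** 2 / n * (var_tau + d2 ** 2 / (2.0 * n) * s2)
    cov = f1 * f2 / n * (-var_tau + d1 * d2 / (2.0 * n) * s2)
    v_r = pi1 ** 2 * v_r1 + pi2 ** 2 * v_r2 + 2.0 * pi1 * pi2 * cov
    return {
        "E[R1]": e_r1, "E[R2]": e_r2, "E[R]": pi1 * e_r1 + pi2 * e_r2,
        "SD[R1]": sqrt(v_r1), "SD[R2]": sqrt(v_r2), "SD[R]": sqrt(v_r),
    }
```

With no misallocation, $\pi_1 = .5$, $\Delta = 2$, $p = 2$ and $n = 100$ (so
$D_1 = D_2 = 1$, $\sigma_\tau^2 = 1$, $\sigma_\omega^2 = 2$), this gives
$E[R_1]$ near $.162$, $SD[R_1]$ near $.025$ and $SD[R]$ near $.004$, in line
with the first row of each block of Table 2.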
4. NUMERICAL RESULTS

Computations were made to evaluate the asymptotic covariance matrix $V_\beta$
for the following cases of input parameters:

    $\pi_1 = .5, .7$

    $\Delta = 2, 4$

    $\alpha_1 = 0, .1, .2, .3, .4$ and $\alpha_2 = 0$

This was done for all three misallocation models discussed in Section 2.
We specified $u = .5$ in model (b), equation (2.4), so that there is a
fifty-fifty chance of misallocation for an observation that falls beyond a
threshold point. Based on these computations, we obtained $\tau$,
$\sigma_\tau^2$, $\sigma_\omega^2$ and the means and variances of the error
rates given in equations (3.12), (3.13) and (3.14). Table 1 lists the values
of $\tau$, $\sigma_\tau^2$ and $\sigma_\omega^2$. From these numerical results,
we find that $\sigma_\tau^2$ increases as $\alpha_1$ increases from 0 to .4,
except that it decreases slightly when $\Delta = 2$, $\pi_1 = .7$ and the
misallocation follows model (c).

The results for $\sigma_\omega^2$ are mixed; it is constant in the case of
misallocation model (a) and it decreases as $\alpha_1$ increases for models (b)
and (c), provided $\Delta = 2$. When $\Delta = 4$, it first decreases and then
increases.

The values of $\sigma_\tau^2$ and $\sigma_\omega^2$ are considerably higher for
model (a) than for the other two models. This is an expected result because the
boundary is subject to higher variability under random mixing in training
samples. Next, the rate of increase in $\sigma_\tau^2$ as a function of
$\alpha_1$ is higher for $\Delta = 4$ than for $\Delta = 2$. Again, this is
expected, since a higher rate of misallocation in training samples will lead to
a larger change in the variance of a mixture distribution when $C_1$ and $C_2$
are more separated, hence causing a large increase in $\sigma_\tau^2$.
Table 1. Values of $\tau$ and Variances Associated With the Sample-Based Boundary

                                                  $\pi_1 = .5$              $\pi_1 = .7$
                                              Misallocation Model       Misallocation Model
                    $(\alpha_1, \alpha_2)$       (a)     (b)     (c)     (a)     (b)     (c)

(i) $\Delta = 2$

$\tau$              (0, 0)                         0       0       0    .424    .424    .424
                    (.1, 0)                    -.221   -.192   -.191    .214    .092    .089
                    (.2, 0)                    -.491   -.398   -.375   -.074   -.167   -.186
                    (.3, 0)                    -.819   -.649   -.565   -.463   -.443   -.435
                    (.4, 0)                   -1.218  -1.001   -.776   -.976   -.815   -.681

$\sigma_\tau^2$     (0, 0)                     1.000   1.000   1.000   1.360   1.360   1.360
                    (.1, 0)                    2.157  1.1.<6   1.130   1.929   1.308   1.235
                    (.2, 0)                    4.327   1.541   1.184   3.475   1.717   1.216
                    (.3, 0)                    8.248   2.473   1.211   7.088   2.542   1.133
                    (.4, 0)                   15.549   5.373   1.296  15.564   5.178   1.072

$\sigma_\omega^2$   (0, 0)                     2.000   2.000   2.000   2.190   2.190   2.190
                    (.1, 0)                    2.000   1.068   1.051   2.190    .845    .824
                    (.2, 0)                    2.000    .747    .533   2.190    .488    .286
                    (.3, 0)                    2.000    .644    .248   2.190    .387    .074
                    (.4, 0)                    2.000    .773    .098   2.190    .515    .005

(ii) $\Delta = 4$

$\tau$              (0, 0)                         0       0       0    .212    .212    .212
                    (.1, 0)                    -.277   -.257   -.257   -.065   -.145   -.147
                    (.2, 0)                    -.617   -.553   -.535   -.413   -.477   -.493
                    (.3, 0)                   -1.034   -.916   -.847   -.874   -.860   -.853
                    (.4, 0)                   -1.546  -1.398  -1.207  -1.483  -1.373  -1.248

$\sigma_\tau^2$     (0, 0)                     1.000   1.000   1.000   1.193   1.193   1.193
                    (.1, 0)                    2.741   1.497   1.480   2.236   1.751   1.703
                    (.2, 0)                    5.821   2.653   2.012   4.558   2.756   2.182
                    (.3, 0)                   10.948   4.976   2.642   9.464   4.765   2.623
                    (.4, 0)                   19.324  10.391   3.494  18.998   9.885   3.125

$\sigma_\omega^2$   (0, 0)                     1.250   1.193   1.193   1.298   1.193   1.193
                    (.1, 0)                    1.250    .511    .497   1.298    .324    .309
                    (.2, 0)                    1.250    .266    .122   1.298    .094    .009
                    (.3, 0)                    1.250    .194    .002   1.298    .044    .077
                    (.4, 0)                    1.250    .285    .055   1.298    .109    .343
If we consider the complete separation model, i.e., $u = 1$, or the other
non-random model of Lachenbruch (1974), the mixture distributions will have
smaller variances than the original distributions have. As such, the variance
$\sigma_\tau^2$ may be smaller as compared to the case of no misallocation in
the samples. In turn, this may lead to smaller values for the expected error
rates, as was observed by Lachenbruch in his sampling study. His study was, of
course, restricted to the linear discriminant function without the term
$\log \pi_1^*/\pi_2^*$ or its estimate $\log n_1/n_2$, as may be the case with
respect to the discriminant boundary, optimum or sample-based.

In Table 2, we present the asymptotic expected values and standard deviations
(SD) of $R_1$, $R_2$ and $R$ corresponding to $\pi_1 = .5$, $\Delta = 2$,
$p = 2$ and $\alpha_1$ and $\alpha_2$ as considered in Table 1. Similar results
can be easily computed for the other cases by making use of the values of
$\tau$, $\sigma_\tau^2$ and $\sigma_\omega^2$ from Table 1. It is seen that
$E[R_1]$ and $SD[R_1]$ increase, whereas $E[R_2]$ and $SD[R_2]$ decrease, as
$\alpha_1$ increases. When $\alpha_1 > 0$ and $\alpha_2 = 0$, we have
$\pi_1^*/\pi_2^* < \pi_1/\pi_2$ and hence the discriminant boundary shifts away
from $\mu_2$ in the direction of $\mu_1$ as $\alpha_1$ increases, causing the
error rate to increase for $C_1$ and to decrease for $C_2$. For the average
error rate, $E[R]$ and $SD[R]$ increase as the misallocation rate $\alpha_1$
increases. Thus, there is an adverse effect on the average error rate $R$ due
to misallocation of samples from one class to another.

In the limit, $E[R_i] \to R_i^0$, $i = 1, 2$, and $E[R] \to R^0$ as $n$ becomes
infinite. The values of $R_1^0$, $R_2^0$ and $R^0$ obtained for $n = \infty$
are also given in Table 2. The corresponding standard deviations are, of
course, zero.


Table 2. Asymptotic Means and Standard Deviations of $R_1$, $R_2$ and $R$ ($\pi_1 = .5$, $\Delta = 2$, $p = 2$)

                                           $n = 100$               $n = \infty$
                                      Misallocation Model       Misallocation Model
            $(\alpha_1, \alpha_2)$       (a)     (b)     (c)     (a)     (b)     (c)

$E[R_1]$    (0, 0)                      .162    .162    .162    .159    .159    .159
            (.1, 0)                     .223    .212    .212    .218    .210    .209
            (.2, 0)                     .311    .276    .268    .305    .273    .266
            (.3, 0)                     .432    .365    .333    .428    .363    .332
            (.4, 0)                     .594    .500    .412    .586    .500    .411

$E[R_2]$    (0, 0)                      .162    .162    .162    .159    .159    .159
            (.1, 0)                     .116    .119    .119    .111    .117    .117
            (.2, 0)                     .074    .084    .086    .068    .081    .085
            (.3, 0)                     .042    .052    .060    .034    .050    .059
            (.4, 0)                     .020    .026    .039    .013    .023    .038

$E[R]$      (0, 0)                      .162    .162    .162    .159    .159    .159
            (.1, 0)                     .169    .166    .166    .165    .163    .163
            (.2, 0)                     .193    .180    .177    .187    .177    .175
            (.3, 0)                     .237    .208    .197    .231    .206    .195
            (.4, 0)                     .307    .263    .225    .300    .262    .225

$SD[R_1]$   (0, 0)                      .025    .025    .025
            (.1, 0)                     .044    .031    .031
            (.2, 0)                     .073    .041    .036
            (.3, 0)                     .113    .059    .040
            (.4, 0)                     .154    .092    .044

$SD[R_2]$   (0, 0)                      .025    .025    .025
            (.1, 0)                     .028    .021    .021
            (.2, 0)                     .028    .019    .017
            (.3, 0)                     .023    .016    .013
            (.4, 0)                     .016    .013    .009

$SD[R]$     (0, 0)                      .004    .004    .004
            (.1, 0)                     .010    .006    .006
            (.2, 0)                     .024    .012    .010
            (.3, 0)                     .046    .022    .014
            (.4, 0)                     .070    .040    .017
5. SMALL SAMPLE RESULTS

Because of the complex algebraic expressions involved in the evaluation of
$V_\beta$, we conducted a Monte Carlo sampling experiment to check the accuracy
of the asymptotic results as well as to study the error rates when the training
sample size is small. Normal random numbers were generated using the technique
of Box and Muller (1958). The simulation study was limited to $p = 2$,
$\Delta = 2, 4$ and $n = 20, 50, 100$. The numbers of training samples from
$C_1$ and $C_2$ were taken to be proportional to their a priori probabilities.
Though there were many other cases, we have chosen to give here the results for
the case of $\pi_1^* = .69$, $\alpha_1^* = .087$, $\alpha_2^* = .226$ in terms
of the mixed classes (this is equivalent to $\pi_1 = .7$, $\alpha_1 = .1$ and
$\alpha_2 = .2$ in terms of the original classes) and $\Delta = 2$. Table 3
presents the means and standard deviations of $R_1$ and $R_2$ for
$n = 20, 50, 100$ obtained from the sampling experiment as well as from the
theoretical results given in (3.12) and (3.13).
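A sampling experiment of this kind can be sketched as follows for $p = 2$ in
canonical form with random misallocation, model (a). This is not the authors'
program: the function names, the use of Python's `random` module as the uniform
source, the divisor $n_1 + n_2 - 2$ in the pooled covariance and the form
$\hat{\beta}_0 = -\log(n_1/n_2) - (\bar{X}_2' S^{-1} \bar{X}_2 - \bar{X}_1' S^{-1} \bar{X}_1)/2$
are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def box_muller(rng):
    """One pair of independent N(0, 1) deviates from two uniforms."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def draw_training_sample(n, pi1, alpha1, alpha2, delta, rng):
    """Points labelled C1* and C2*; true labels are flipped at random."""
    n1 = round(n * pi1)  # true class sizes proportional to the priors
    labelled = ([], [])
    for true_class, size in ((0, n1), (1, n - n1)):
        shift = -delta / 2.0 if true_class == 0 else delta / 2.0
        rate = alpha1 if true_class == 0 else alpha2
        for _ in range(size):
            z1, z2 = box_muller(rng)
            label = 1 - true_class if rng.random() < rate else true_class
            labelled[label].append((z1 + shift, z2))
    return labelled


def conditional_error_rates(sample1, sample2, delta):
    """Fit the sample rule and return its exact error rates (R1, R2)
    for the original classes N((-delta/2, 0), I) and N((delta/2, 0), I)."""
    n1, n2 = len(sample1), len(sample2)
    m1 = [sum(x[j] for x in sample1) / n1 for j in (0, 1)]
    m2 = [sum(x[j] for x in sample2) / n2 for j in (0, 1)]
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled covariance matrix
    for pts, m in ((sample1, m1), (sample2, m2)):
        for x in pts:
            d = (x[0] - m[0], x[1] - m[1])
            for j in (0, 1):
                for k in (0, 1):
                    s[j][k] += d[j] * d[k] / (n1 + n2 - 2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]

    def quad(a):
        return sum(a[j] * inv[j][k] * a[k] for j in (0, 1) for k in (0, 1))

    diff = (m2[0] - m1[0], m2[1] - m1[1])
    b = [inv[j][0] * diff[0] + inv[j][1] * diff[1] for j in (0, 1)]
    b0 = -math.log(n1 / n2) - (quad(m2) - quad(m1)) / 2.0
    norm = math.hypot(b[0], b[1])
    cdf = NormalDist().cdf
    r1 = 1.0 - cdf(-(b0 - b[0] * delta / 2.0) / norm)
    r2 = cdf(-(b0 + b[0] * delta / 2.0) / norm)
    return r1, r2


def monte_carlo_means(n, pi1, alpha1, alpha2, delta, reps=200, seed=1):
    """Average R1 and R2 over repeated training samples."""
    rng = random.Random(seed)
    total1 = total2 = 0.0
    for _ in range(reps):
        s1, s2 = draw_training_sample(n, pi1, alpha1, alpha2, delta, rng)
        r1, r2 = conditional_error_rates(s1, s2, delta)
        total1 += r1
        total2 += r2
    return total1 / reps, total2 / reps
```

Because the error rates are computed exactly for each fitted rule, only the
training samples are simulated, which keeps the Monte Carlo noise down to the
variation of the rule itself.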

Besides misallocation models (a), (b) and (c), we also consider the case of
no misallocation in training samples, i.e., $\alpha_1 = \alpha_2 = 0$. This is
listed as model (0) in Table 3. Based on these and other results, we find good
agreement between the sampling and asymptotic results. When $n = 100$, the two
sets of values of $E[R_1]$, $E[R_2]$, $SD[R_1]$ and $SD[R_2]$ agree at least up
to the second decimal place. Moreover, the agreement holds quite well even for
the small sample size of $n = 20$.

A comparison between the results for model (0) and those of the other three
models shows that misallocations under models (b) and (c) lead to about the
same results as are obtained with no misallocation in training samples. With
random misallocation, the actual error rates are considerably biased and have
much larger variances.
Table 3. The Means and Standard Deviations of $R_1$ and $R_2$
($\pi_1^* = .69$, $\alpha_1^* = .087$, $\alpha_2^* = .226$, $\Delta = 2$, $p = 2$)

Columns (a), (b), (c): misallocation models; column (0): no misallocation.

                          Sampling                        Asymptotic
Parameter        (a)     (b)     (c)     (0)     (a)     (b)     (c)     (0)

(i) $n = 100$

$E[R_1]$        .044    .081    .090    .078    .046    .082    .086    .081
$E[R_2]$        .434    .286    .267    .291    .416    .280    .268    .286
$SD[R_1]$       .023    .021    .022    .017    .027    .021    .019    .017
$SD[R_2]$       .118    .048    .047    .042    .117    .047    .040    .040

(ii) $n = 50$

$E[R_1]$        .057    .085    .087    .084    .055    .084    .088    .085
$E[R_2]$        .423    .291    .276    .289    .421    .282    .269    .289
$SD[R_1]$       .036    .029    .025    .025    .040    .030    .027    .025
$SD[R_2]$       .143    .072    .051    .054    .166    .067    .057    .056

(iii) $n = 20$

$E[R_1]$        .083    .095    .096    .089    .079    .091    .094    .096
$E[R_2]$        .482    .323    .295    .323    .435    .289    .275    .299
$SD[R_1]$       .112    .060    .043    .044    .074    .049    .044    .042
$SD[R_2]$       .237    .137    .115    .101    .264    .106    .090    .090
So, if an allocation procedure for training samples is formulated based on the
concept underlying models (b) and (c), the effect of misallocation on the
linear discriminant analysis for two classes can be minimized.


6. CONCLUDING REMARKS

In practice, $c^* = \log \pi_1^*/\pi_2^*$ or its estimate, as may be the case,
is often not included in the discriminant boundary. This leads to what is
sometimes referred to as the Fisher classification rule. Otherwise, it may be
called the Bayes classification rule. To study the difference in the error
rates caused by the exclusion of $\hat{c}^* = \log n_1/n_2$ from the
discriminant function as given in (1.1), we obtained the means and standard
deviations of $R_1$ and $R_2$ for each rule. The results are presented in
Table 4 for the case of $\pi_1^* = .69$, $\alpha_1^* = .087$,
$\alpha_2^* = .226$ and $n = 100$. Results are also given for the case of
$\alpha_1 = 0$, $\alpha_2 = 0$.

Since the simulation and asymptotic results are almost the same when
$n = 100$, either of the two sets of results can be considered. We have listed
in Table 4 the results obtained by the Monte Carlo method.

A comparison between the results of misallocation models (a), (b), (c) and
those of the no misallocation model (0) shows that the means and standard
deviations of $R_1$ and $R_2$, and hence of $R$, are less affected by
misallocation in the case of the Fisher rule than for the Bayes rule,
particularly when misallocation is random. This difference is larger in the
case of $\Delta = 4$. Since $\pi_1 = .7$ and $\pi_1^* = .69$, $\pi_1/\pi_2$ is
approximately equal to $\pi_1^*/\pi_2^*$. So the ratio $n_1/n_2$ can be
considered an equally good estimate of $\pi_1/\pi_2$, and thus it hardly
introduces any additional shift in the discriminant boundary otherwise obtained
from the correctly allocated samples. However, when the two ratios,
$\pi_1/\pi_2$ and $\pi_1^*/\pi_2^*$, are not the same, the shift due to the
inclusion of $\log n_1/n_2$ in the discriminant function may become
considerable and hence may cause higher bias as well as higher variance for an
error rate. Thus, unless the allocation procedure for the training samples is
objectively formulated as reflected in our models (b) and (c), the use of the
Fisher rule may be preferred over the Bayes rule because of its robustness
property.

Table 4. The Means and Standard Deviations of $R_1$ and $R_2$ for Fisher and Bayes Classification Rules
($\pi_1^* = .69$, $\alpha_1^* = .087$, $\alpha_2^* = .226$, $p = 2$, $n = 100$)

Columns (a), (b), (c): misallocation models; column (0): no misallocation.

                           Fisher                           Bayes
Parameter        (a)     (b)     (c)     (0)     (a)     (b)     (c)     (0)

(i) $\Delta = 2$

$E[R_1]$        .189    .150    .147    .158    .044    .081    .090    .078
$E[R_2]$        .148    .175    .179    .166    .434    .286    .267    .291
$SD[R_1]$       .034    .023    .024    .027    .023    .021    .022    .017
$SD[R_2]$       .031    .026    .028    .027    .118    .048    .047    .042

(ii) $\Delta = 4$

$E[R_1]$        .039    .028    .026    .024    .007    .010    .011    .015
$E[R_2]$        .019    .023    .025    .024    .112    .061    .057    .038
$SD[R_1]$       .015    .008    .008    .006    .006    .005    .005    .004
$SD[R_2]$       .008    .007    .008    .006    .066    .021    .020    .011
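The size of the shift caused by the log-ratio term can be illustrated in the
large-sample limit. Under one reading of the canonical form, the Fisher cut
point is the midpoint $(\mu_{11}^* + \mu_{21}^*)/2$ of the mixed-class means on
the $x_1$-axis, and the Bayes cut point adds $c^*/\beta_1$ to it, with
$c^* = \log(\pi_1^*/\pi_2^*)$ and
$\beta_1 = (\mu_{21}^* - \mu_{11}^*)/(1 + \zeta)$. The sketch below evaluates
both. The function name and this reading of the boundary are assumptions, and
the calculation says nothing about finite-sample bias or variance.

```python
from math import log


def fisher_and_bayes_cut_points(pi1_star, mu11_star, mu21_star, zeta):
    """Limiting cut points on the x1-axis without and with the log-ratio term."""
    fisher = (mu11_star + mu21_star) / 2.0            # midpoint of mixed-class means
    c_star = log(pi1_star / (1.0 - pi1_star))          # log(pi1*/pi2*)
    beta1 = (mu21_star - mu11_star) / (1.0 + zeta)     # slope along x1
    bayes = fisher + c_star / beta1                    # shift caused by the log term
    return fisher, bayes
```

For random misallocation with $\pi_1 = .5$, $\alpha_1 = .1$, $\alpha_2 = 0$ and
$\Delta = 2$ (so $\pi_1^* = .45$, $\mu_{11}^* = -1$, $\mu_{21}^* = 9/11$,
$\zeta = 2/11$), the Fisher cut point is about $-0.091$ and the Bayes cut point
about $-0.221$, so the log term accounts for more than half of the total
displacement from the ideal boundary at 0.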
APPENDIX A

Mixture Distributions of $C_1^*$ and $C_2^*$

We obtain the parameters of the two mixture distributions by expressing these
in terms of $\mu_1$, $\mu_2$, $\Sigma$, $\alpha_1$ and $\alpha_2$. First we
obtain these parameters by considering the original class structure and then
give these parameters for the case of canonical form.

Without loss of generality, let $\mu_1$ and $\mu_2$ be aligned along the
$x_1$-axis and the conditional means in other dimensions, given $x_1$, be

$$ \mu_{ij|x_1} = \mu_{ij} + \gamma_j (x_1 - \mu_{i1}), \quad j = 2, 3, \ldots, p \tag{A-1} $$

for $X \in C_i$, $i = 1, 2$. Suppose $\sigma^2$ denotes the common variance of
the two distributions for $X_1$, the first component of the random vector $X$.
Let $\mu_i^*$ and $\Sigma_i^*$ denote the mean vector and covariance matrix for
$C_i^*$, $i = 1, 2$. The frequency function of $X_1$ for $C_i^*$ can be
written as

$$ f_i^*(z) = [1 - g_i(z)]\,f_i(z) + g_{3-i}(z)\,f_{3-i}(z) \tag{A-2} $$

where $g_i(z)$ and $f_i(z)$, $i = 1, 2$, are as defined in Section 2.


Then the probability associated with $C_i^*$ is

$$ \pi_i^* = \int_{-\infty}^{\infty} f_i^*(z)\,dz = \pi_i (1 - \alpha_i) + \alpha_{3-i}\,\pi_{3-i}, \quad i = 1, 2, $$

and $\pi_1^* + \pi_2^* = 1$.
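As a worked instance of this bookkeeping, the sketch below converts original
class parameters into mixed-class ones. It uses the probability formula just
given together with the starred misallocation rates
$\alpha_1^* = \pi_2\alpha_2/\pi_1^*$ and $\alpha_2^* = \pi_1\alpha_1/\pi_2^*$,
i.e. the fraction of each mixed class that truly belongs to the other class.
The function name is hypothetical.

```python
def mixed_class_parameters(pi1, alpha1, alpha2):
    """Return (pi1*, pi2*, alpha1*, alpha2*) for the mixed classes."""
    pi2 = 1.0 - pi1
    pi1_star = pi1 * (1.0 - alpha1) + pi2 * alpha2  # mass labelled C1
    pi2_star = pi2 * (1.0 - alpha2) + pi1 * alpha1  # mass labelled C2
    alpha1_star = pi2 * alpha2 / pi1_star           # share of C1* coming from C2
    alpha2_star = pi1 * alpha1 / pi2_star           # share of C2* coming from C1
    return pi1_star, pi2_star, alpha1_star, alpha2_star
```

For $\pi_1 = .7$, $\alpha_1 = .1$ and $\alpha_2 = .2$ this gives
$\pi_1^* = .69$, $\alpha_1^* \approx .087$ and $\alpha_2^* \approx .226$, the
case used in Tables 3 and 4.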


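Define

$$ m_i = \frac{1}{\pi_i \alpha_i} \int_{-\infty}^{\infty} \left(\frac{z - \mu_{i1}}{\sigma}\right) g_i(z)\,f_i(z)\,dz \tag{A-3} $$

and

$$ v_i = \frac{1}{\pi_i \alpha_i} \int_{-\infty}^{\infty} \left(\frac{z - \mu_{i1}}{\sigma}\right)^2 g_i(z)\,f_i(z)\,dz, \quad i = 1, 2. \tag{A-4} $$

Now the elements of $\mu_i^*$, $i = 1, 2$, are obtained as follows. For
$j = 1$,

$$ \mu_{i1}^* = \frac{1}{\pi_i^*} \int_{-\infty}^{\infty} z\,f_i^*(z)\,dz. $$

It follows from (2.2) and (A-2) to (A-4) that

$$ \pi_1^* \mu_{11}^* = \pi_1 [(1 - \alpha_1)\mu_{11} - \alpha_1 m_1 \sigma] + \pi_2 \alpha_2 (\mu_{21} + m_2 \sigma) $$

$$ \qquad\quad = \pi_1^* \mu_{11} + \pi_2 \alpha_2 (\mu_{21} - \mu_{11}) + (\pi_2 \alpha_2 m_2 - \pi_1 \alpha_1 m_1)\,\sigma. $$

Similarly,

$$ \pi_2^* \mu_{21}^* = \pi_2^* \mu_{21} - \pi_1 \alpha_1 (\mu_{21} - \mu_{11}) - (\pi_2 \alpha_2 m_2 - \pi_1 \alpha_1 m_1)\,\sigma. $$

For $j = 2, 3, \ldots, p$, we have

$$ \pi_1^* \mu_{1j}^* = \int \mu_{1j|z}\,[1 - g_1(z)]\,f_1(z)\,dz + \int \mu_{2j|z}\,g_2(z)\,f_2(z)\,dz. $$

Making substitutions from (A-1) and simplifying it, we get

$$ \pi_1^* \mu_{1j}^* = \pi_1^* \mu_{1j} + \pi_2 \alpha_2 (\mu_{2j} - \mu_{1j}) + \gamma_j (\pi_2 \alpha_2 m_2 - \pi_1 \alpha_1 m_1)\,\sigma. $$

Similarly,

$$ \pi_2^* \mu_{2j}^* = \pi_2^* \mu_{2j} - \pi_1 \alpha_1 (\mu_{2j} - \mu_{1j}) - \gamma_j (\pi_2 \alpha_2 m_2 - \pi_1 \alpha_1 m_1)\,\sigma. $$

Let

$$ \delta_j = \mu_{2j} - \mu_{1j}, \quad j = 1, 2, \ldots, p, $$

$$ t = \pi_2 \alpha_2 m_2 - \pi_1 \alpha_1 m_1, $$

$$ \alpha_1^* = \pi_2 \alpha_2 / \pi_1^*, \qquad \alpha_2^* = \pi_1 \alpha_1 / \pi_2^*. \tag{A-5} $$

Then

$$ \mu_{1j}^* = \mu_{1j} + \alpha_1^* \delta_j + \gamma_j \sigma t / \pi_1^*, $$

$$ \mu_{2j}^* = \mu_{2j} - \alpha_2^* \delta_j - \gamma_j \sigma t / \pi_2^*, \quad j = 1, \ldots, p, \tag{A-6} $$

where $\gamma_1 = 1$. Another form of (A-6) that will be used in the derivation
of the covariance matrices $\Sigma_1^*$ and $\Sigma_2^*$ is:

$$ \mu_{1j}^* = \mu_{2j} - (1 - \alpha_1^*)\,\delta_j + \gamma_j \sigma t / \pi_1^*, $$

$$ \mu_{2j}^* = \mu_{1j} + (1 - \alpha_2^*)\,\delta_j - \gamma_j \sigma t / \pi_2^*. \tag{A-7} $$

Next, the covariance matrix for $C_1^*$,

$$ \Sigma_1^* = E^*[(X - \mu_1^*)(X - \mu_1^*)'], $$

can be written as

$$ \Sigma_1^* = \frac{1}{\pi_1^*} \int [\Sigma_{\cdot|z} + (\mu_{1|z} - \mu_1^*)(\mu_{1|z} - \mu_1^*)']\,[1 - g_1(z)]\,f_1(z)\,dz $$

$$ \qquad + \frac{1}{\pi_1^*} \int [\Sigma_{\cdot|z} + (\mu_{2|z} - \mu_1^*)(\mu_{2|z} - \mu_1^*)']\,g_2(z)\,f_2(z)\,dz, \tag{A-8} $$

where $\mu_{i|z}$ and $\Sigma_{\cdot|z}$ are the conditional mean vector and
covariance matrix of $X$ given $z$. This easily follows from the conditional
expectation argument. The elements of $\mu_{i|z}$, $i = 1, 2$, are given in
(A-1) with $x_1$ replaced by $z$. Letting

$$ \delta = (\delta_1, \ldots, \delta_p)', \qquad \gamma = (\gamma_1, \ldots, \gamma_p)', $$

and making substitutions from (A-1), (A-6) and (A-7) in (A-8), it can be shown
that

$$ \Sigma_1^* = \Sigma_{\cdot|z} + \alpha_1^* (1 - \alpha_1^*)\,\delta\delta' + (\pi_1 + \chi_1)\,\gamma\gamma'\,\sigma^2/\pi_1^* + \psi_1 (\delta\gamma' + \gamma\delta')\,\sigma/\pi_1^*, \tag{A-9} $$

where

$$ \chi_1 = \pi_1 (t/\pi_1^*)^2 - \pi_1 \alpha_1 [v_1 + (t/\pi_1^*)^2 - 2 m_1 (t/\pi_1^*)] + \pi_2 \alpha_2 [v_2 + (t/\pi_1^*)^2 - 2 m_2 (t/\pi_1^*)], $$

$$ \psi_1 = \alpha_1^* [(1 - \alpha_1^*)\,t + \pi_1 \alpha_1 m_1] + (1 - \alpha_1^*) [-\alpha_1^* t + \pi_2 \alpha_2 m_2]. $$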
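Similarly, the covariance matrix $\Sigma_2^*$ is obtained as

$$ \Sigma_2^* = \Sigma_{\cdot|z} + \alpha_2^* (1 - \alpha_2^*)\,\delta\delta' + (\pi_2 + \chi_2)\,\gamma\gamma'\,\sigma^2/\pi_2^* + \psi_2 (\delta\gamma' + \gamma\delta')\,\sigma/\pi_2^*, \tag{A-10} $$

where

$$ \chi_2 = \pi_2 (t/\pi_2^*)^2 - \pi_2 \alpha_2 [v_2 + (t/\pi_2^*)^2 + 2 m_2 (t/\pi_2^*)] + \pi_1 \alpha_1 [v_1 + (t/\pi_2^*)^2 + 2 m_1 (t/\pi_2^*)], $$

$$ \psi_2 = \alpha_2^* [(1 - \alpha_2^*)\,t - \pi_2 \alpha_2 m_2] - (1 - \alpha_2^*) [\alpha_2^* t + \pi_1 \alpha_1 m_1]. $$

In the discriminant function, we use the pooled covariance matrix, which is an
estimate of the weighted covariance matrix,
$\Sigma^* = \pi_1^* \Sigma_1^* + \pi_2^* \Sigma_2^*$, given by

$$ \Sigma^* = \Sigma + \eta\,\delta\delta' + \chi\,\gamma\gamma'\,\sigma^2 + \psi\,(\delta\gamma' + \gamma\delta')\,\sigma, \tag{A-11} $$

where

$$ \eta = \alpha_1^* (1 - \alpha_1^*)\,\pi_1^* + \alpha_2^* (1 - \alpha_2^*)\,\pi_2^*, $$

$$ \chi = \chi_1 + \chi_2, $$

$$ \psi = \psi_1 + \psi_2, $$

with $\delta$, $t$ and the $\alpha^*$'s as defined in (A-5). In obtaining
(A-11), we have made use of the fact that
$\Sigma_{\cdot|z} + \gamma\gamma'\,\sigma^2 = \Sigma$.

In the case of canonical form, the mean vectors for $C_i^*$, $i = 1, 2$, and
the weighted covariance matrix are

$$ \mu_1^* = [-(1 - 2\alpha_1^*)(\Delta/2) + t/\pi_1^*]\,e_1, $$

$$ \mu_2^* = [(1 - 2\alpha_2^*)(\Delta/2) - t/\pi_2^*]\,e_1, $$

$$ \Sigma^* = I + \zeta\,e_1 e_1', $$

where

$$ e_1 = (1, 0, \ldots, 0)', $$

$$ \zeta = \eta \Delta^2 + \chi + 2 \Delta \psi. $$

These expressions are obtained from (A-6) and (A-11) by recognizing that
$\delta_1 = \Delta$, $\gamma_1 = 1$ and $\sigma = 1$, $\Sigma = I$, and
$\delta_j = 0$ and $\gamma_j = 0$, $j = 2, 3, \ldots, p$.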
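The canonical-form expressions can be turned into a small calculator. The
sketch below assumes the mixed-class means along the $x_1$-axis are
$-(1 - 2\alpha_1^*)(\Delta/2) + t/\pi_1^*$ and
$(1 - 2\alpha_2^*)(\Delta/2) - t/\pi_2^*$, and that the pooled variance of the
first coordinate is $1 + \zeta$ with $\zeta = \eta\Delta^2 + \chi + 2\Delta\psi$.
The quantities $t$, $\chi$ and $\psi$ depend on the misallocation model and are
passed in. Their default of zero corresponds to random misallocation, where the
misallocated points are an unbiased subsample ($m_i = 0$, $v_i = 1$). The
function name and this default are assumptions.

```python
def canonical_mixture(pi1, alpha1, alpha2, delta, t=0.0, chi=0.0, psi=0.0):
    """Return (pi1*, mu11*, mu21*, zeta) for the mixed classes in canonical form."""
    pi2 = 1.0 - pi1
    pi1_star = pi1 * (1.0 - alpha1) + pi2 * alpha2
    pi2_star = 1.0 - pi1_star
    a1_star = pi2 * alpha2 / pi1_star   # share of C1* that truly is C2
    a2_star = pi1 * alpha1 / pi2_star   # share of C2* that truly is C1
    mu11_star = -(1.0 - 2.0 * a1_star) * delta / 2.0 + t / pi1_star
    mu21_star = (1.0 - 2.0 * a2_star) * delta / 2.0 - t / pi2_star
    eta = (a1_star * (1.0 - a1_star) * pi1_star
           + a2_star * (1.0 - a2_star) * pi2_star)
    zeta = eta * delta ** 2 + chi + 2.0 * delta * psi
    return pi1_star, mu11_star, mu21_star, zeta
```

For random misallocation with $\pi_1 = .5$, $\alpha_1 = .1$, $\alpha_2 = 0$ and
$\Delta = 2$, the result is $\pi_1^* = .45$, $\mu_{11}^* = -1$,
$\mu_{21}^* = 9/11$ and $\zeta = 2/11$: the contaminated class $C_2^*$ has its
mean pulled toward $C_1$ and the pooled variance is inflated by about 18
percent.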
APPENDIX B

Derivation of $V_\beta$

Let $\sigma_{(1)}$ denote the first row of $\Sigma$ and
$\sigma_{(2)} = (\sigma_{22}, \sigma_{23}, \ldots, \sigma_{pp})$, where
$\sigma_{22}, \sigma_{23}, \ldots, \sigma_{pp}$ are the elements of the upper
triangular matrix of $\Sigma$ with its first row excluded. Suppose
$\sigma_{(1)}^*$ is the first row of $\Sigma^*$ and $\sigma_{(2)}^*$ the vector
of elements of the upper triangular matrix, less $\sigma_{(1)}^*$, of
$\Sigma^*$. In the determination of $V_\beta$, there is no need to consider
$\sigma_{(2)}$ and $\sigma_{(2)}^*$; e.g., refer to Lemma 2 in Efron (1975).
Suppose $c = \log \pi_1/\pi_2$, $c^* = \log \pi_1^*/\pi_2^*$,

$$ \theta = (c, \mu_1, \mu_2, \sigma_{(1)}), $$

$$ \alpha = (\alpha_1, \alpha_2, \alpha_1 m_1, \alpha_2 m_2) \tag{B-1} $$

and

    [display equation (B-2) illegible in the source]

Then by the $\delta$-method (Rao, 1973), we have

    [display equations expressing $V_\beta$ and its components illegible in the source]

The elements of $V_\theta$ can be obtained by evaluating the asymptotic
variances for the maximum likelihood estimates of $\theta$. Restricting
ourselves to the case of canonical form, we have the following asymptotic
variances of the estimates:

    [display equation illegible in the source]

and their asymptotic covariance is zero. Determination of $V_\alpha$ and
$V_{\theta\alpha}$ would require the misallocation model to be specified. We
skip the specifics and sketch the main steps involved in obtaining these
matrices.

Define the random variable $y$ by

$$ y = \begin{cases} 0, & \text{sample observation } X \text{ is correctly allocated} \\ 1, & \text{sample observation } X \text{ is misallocated.} \end{cases} $$

If $X \in C_i$, then it can be seen from (2.2) and (A-4) that
$E[y] = \alpha_i$, $V[y] = \alpha_i (1 - \alpha_i)$, $E[yz] = \alpha_i m_i$
and $E[(yz)^2] = E[yz^2] = \alpha_i v_i$ (say). So the asymptotic elements of
$V_\alpha$ are given by

$$ n\,V[\hat{\alpha}_i] = V[y] = \alpha_i (1 - \alpha_i), $$

$$ n\,V[\hat{\alpha}_i \hat{m}_i] = V[yz] = \alpha_i v_i - (\alpha_i m_i)^2, \tag{B-5} $$

$$ n\,\mathrm{Cov}[\hat{\alpha}_i, \hat{\alpha}_i \hat{m}_i] = \mathrm{Cov}[y, yz] = \alpha_i (1 - \alpha_i)\,m_i, \quad i = 1, 2. $$
Noting that these variables are independent for $C_1$ and $C_2$, all elements
of $V_\alpha$ are obtained in (B-5). Next, $V_{\theta\alpha}$ may be derived by
the use of the $\delta$-method. Denote $\alpha = \alpha(\theta)$. Then
$d\alpha = (\partial\alpha/\partial\theta)\,d\theta$ and
$d\theta\,(d\alpha)' = d\theta\,(d\theta)'\,(\partial\alpha/\partial\theta)'$.

Thus

$$ V_{\theta\alpha} = E[d\theta\,(d\alpha)'] = V_\theta\,(\partial\alpha/\partial\theta)'. $$

It can be shown that

    [display equations (B-6) and (B-7), giving the elements of
    $\partial\alpha/\partial\theta$ and the integral they involve, illegible in
    the source]

which can be easily evaluated by specifying $g_i(z)$, $i = 1, 2$.
Though the derivative matrices are somewhat complex, their derivations are
fairly straightforward. These are as follows:

    [display equations (B-8) and (B-9) and the auxiliary definitions that
    follow them are illegible in the source]